All Questions
Tagged with scikit-learn pandas
146 questions
4 votes
0 answers
23 views
Trying to train ML model to do regression for US Department of Transportation Kaggle Flights Dataset with 5 million records and 7 features
For a college project for my data science course I am trying to fit a model based on the U.S. DOT's 2015 Kaggle Flight Cancellations dataset, but am not having great luck with model performance (MSE ...
2 votes
0 answers
20 views
Preprocessing multivalue attributes in a dataframe, similar to Nominal
Description: Input is a CSV file. The CSV file contains columns of different data types: ordinal values, nominal values, numerical values, and multi-value. For the multi-value columns, the minimum is 1, ...
3 votes
1 answer
31 views
Looking to replace missing time series values with values from a competitor that's correlated
I have a retailer's dataset with the following attributes: Date, Hour, Enters, Exits. I have another dataset with the same attributes for a competitor, which is correlated with the original dataset ...
1 vote
1 answer
45 views
RFECV and grid search - what sets to use for hyperparameter tuning?
I am running machine learning models (all with scikit-learn estimators, no neural networks) using a custom dataset with a number of features and binomial output. I first split the dataset into 0.6 (...
2 votes
1 answer
71 views
Why does the LightGBM .predict function return probabilities that are not between 0 and 1?
I want to understand why I get the following results with this code: ...
1 vote
0 answers
35 views
How to predict price behavior according to model predictions for a week ahead?
I wrote the simplest linear regression model (I'm a noob, please don't scold me; this is my first model) to predict the price of Solana. I would like to get some advice or tips on how to improve. The ...
0 votes
1 answer
550 views
Python scikit-learn KNNImputer ("ValueError: could not convert string to float")
I have data with missing values. All columns are integer, except for a column that has missing values. These missing values were marked with a "?", which was converted to NaN using the NumPy ...
0 votes
1 answer
31 views
Best measure to inform how predicted value can differ from real one
I have trained a regression model and obtained a pandas series of the predicted values. I am working on a "calculator" that will be able to return a predicted value after entering an input ...
0 votes
1 answer
261 views
Ugly AUC curves. Sklearn. How to make AUC Curves less square
I dislike the square look of this AUC curve (SKLearn). The purpose of this question is "visual". Please post code snippets. This question is not requesting the theory behind the AUC. My goal ...
0 votes
1 answer
1k views
Is there any benefit to using cross validation from the XGBoost library over sklearn when tuning hyperparameters?
The XGBoost library has its own implementation of cross validation through xgboost.cv(). It looks like it requires data be stored as a DMatrix. Instead of using <...
0 votes
2 answers
2k views
Does Scikit-Learn's OneHotEncoder make all Columns Categorical?
I've been using Scikit-Learn's OneHotEncoder to turn categorical data into binary columns; however, it seems that fitting ...
0 votes
1 answer
164 views
What Should I Do with NaN Values when Transforming Categorical Data to Numerical Data?
I'm currently working on a dataset with several categorical features, which I need to transform into numerical features through one-hot encoding. However, some of the features have ...
0 votes
1 answer
953 views
Linear Regression line not showing in plot
It's a silly problem, I know, but it's getting on my nerves. Everything seems fine, but I cannot get the line to show on the plot. I've put it in a public Google notebook, for your convenience. t ...
0 votes
1 answer
675 views
Calculate sklearn metrics from a 2D array
I have the following frame of actual value, ...
0 votes
0 answers
70 views
Why do I get this error (AttributeError: split not found) when finding the best accuracy for logistic regression?
After running this code, I get the "split not found" error. ...